Hi!
We are Aviv, Kesem and Nitzan, 2nd year of master’s degree in social cognitive psychology.
Here we present our final project at Information Visualization course at Ben-Gurion University of the Negev, semester a 2022.
I.D.s:
Aviv Mokady: 203970249
Kesem Adi: 302585823
Nitzan Zilberman: 315604710
Our data was retrieved from link, link, link. It describes the grades in the Bagrut exams across high schools in Israel.
We had four different data frames, one for 2013-2016, and for each year 2017, 2018, 2019.
Each row is an observations of an exam grade, while each column represents a characteristic of the school or the exam.
2013 – 2016
Contain eight variables: Grades, Number of students, Number of Yahal (number of study unites), Graduation year, Subject, School name, School ID.
The data consists of observations for 69,638 exams in 315 cities, 976 schools and 118 subjects.
2017
Contain eleven variables: eight variables similar to the 2013-2016 dataset and three additional variables: supervision, sector, and district.
The data consists of observations for 14,184 exams in 306 cities, 1,063 schools and 110 subjects.
2018
Contain eleven variables, the same as in 2017.
The data consists of observations for 14,975 exams in 311 cities, 1,020 schools and 109 subjects.
2019
Contain eleven variables, the same as in 2017 and 2018.
The data consists of observations for 15,192 exams in 320 cities, 1,060 schools and 108 subjects.
Combining
The Final combined dataset consists of observations for 113,989 exams in 336 cities, 1,174 schools and 130 subjects.
Coordination
We knew we wanted to present some of the information on a map, to do so we searched for the coordinates of the school and the cities in our data.
The cities were taken from here and the schools taken from here. We then combined this information with our data to create our final and complete dataframe for this project.
| City | school_id | students | yahal | grad_year | subject | school_name | grades | district | sector | supervision | school_x | school_y | city_x | city_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| אבו גוש | 148080 | 25 | 5 | 2018 | מדעי לימודי הסביבה | מקיף אבו גוש | 74.48 | ירושלים | ערבי | כללי | 35.10649 | 31.81075 | 3908447 | 3737870 |
| אבו גוש | 148080 | 14 | 4 | 2014 | מתמטיקה | מקיף אבו גוש | 73.79 | ירושלים | ערבי | כללי | 35.10649 | 31.81075 | 3908447 | 3737870 |
| אבו גוש | 148080 | 34 | 5 | 2015 | מדעי לימודי הסביבה | מקיף אבו גוש | 76.79 | ירושלים | ערבי | כללי | 35.10649 | 31.81075 | 3908447 | 3737870 |
| אבו גוש | 148080 | 13 | 5 | 2015 | כימיה | מקיף אבו גוש | 75.15 | ירושלים | ערבי | כללי | 35.10649 | 31.81075 | 3908447 | 3737870 |
| אבו גוש | 148080 | 26 | 5 | 2016 | מדעי לימודי הסביבה | מקיף אבו גוש | 76.96 | ירושלים | ערבי | כללי | 35.10649 | 31.81075 | 3908447 | 3737870 |
| אבו גוש | 148080 | 20 | 4 | 2015 | מתמטיקה | מקיף אבו גוש | 68.65 | ירושלים | ערבי | כללי | 35.10649 | 31.81075 | 3908447 | 3737870 |
Haaretz newspaper reported the Bagrut grades for 2010-2014. You can find the original article here.
City - Subject
In the article, they show an interactive graph in which you can select the city and the subject and it will present the trend of the grades along the years. The graph alows you to compare different cities and choose a specific year to receive more information about.In general this visualization is an example of a well done visualization, but it could be improved. We think instead of having the years as a separate choice above the graph, this could have been done by making the points along the line chart interactive such that clicking on them will change the left information window.
Big cities - Subject
Another graph from the same Haaretz article allows to select a subject and a year and shows you the grades in the big cities.IN contrast to the first visualization, we think this one is not as good. It looks nice, but where are the grades? The grades do not appear on the graph and the dashed lines not indicative.
Moreover, why were those cities were chosen? Tel-Aviv is one of the biggest but it’s not here. Additionally, when choosing different subjects or years, the ploted citeis change with no good explanation to how they are being chosen.
We also recommend to highlight the overall mean grades, right now you can’t compare to th big cities to the general population.
‘Give Five’- Bennet’s program to increase 5 Yahal in Math
Globes’s article (link) wanted to test whether Bennet’s program to double the amount of students who are in 5 Yahal in Math worked as he claims. To do so they ploted this graph:As Globes shows, Bennet’s program is not necessarily the cause in the increase.
The program measured success according to the absolute number of students. Clearly they were not present at the ‘What Not To Do’ part of our course, because if they were they would know you should test improvement with percentages, as we will show in our first visualization.
District, Supervision, and sector
We remained with six geographic districts (South, center, Tel-Aviv, Jerusalem, Haifa, North), six sectors (Haredi, Dati, Jewish, Badawi, Arab, Druz) and three supervisions (General, Dati, Independent).
Coordinates
General cleaning
Why:
The ‘Give Five’ program was a major program for the Ministry of Education led by P.M. Bennet. According to the Ministry and Bennet, the program worked well, we want to verify this claim.
What:
A line chart visualizing the trend in number and precentage of students in 5 Yahal in Math and in English divided by sectors.
How:
Using a line chart we examine the change in number and percentage of students in 5 Yahal in Math. Additionally, we examine whether the change is similar in different sectors, or whether some sectors were affected more than others. Finally, we compare the change in math with the change in English to see if the program is the cause of the change.
Here is the link for the interactive app.
We see that similar to Globes’s finding, the increase in number of students in 5 Yahal has began before the program started (before 2015). Furthermore, a similar increase is observed in English 5 Yahal students, where no program was employed.
Why:
There are claims of inequality between the center and big cities of the country and the periphery, in jobs, salaries, health systems and education. This data and project gives us an opportunity to test this claim on the aspect of education through Bagrut grades and geographical information. We will examine if there is a visible difference between central cities and periphery areas in math grades in 2019.
What:
a map of Israel and color the point of each school by the average math grade. to simplify visibility we will divide the data to four groups-
How:
The map with scater plot on it shows the geographical positions of the school, by coloring them in positive and negative (green and red) colors, we could see areas which are colored more positive (i.e., above average areas) and areas which are colored more negative (i.e., below average areas).
The map shows that central areas (i.e., Gush-Dan and Jerusalem areas) are scattered with more green dots, whereas the periphery is scattered with more red dots, thus validating the claims of inequality in education, for math grade the least.
Why:
Ezrahut (Citizenship) is a subject that is learned the same in all sectors although it might be problematic for some sectors more than others. Does this effect the abilities of some sectors while giving an advantage to others?
What:
A bar plot comparing the best and worst schools by Ezrahut grades from 2019 (the latest year in our data). Comparison will be done by deviation from country average. The bars will be colored by sector.
How:
If all the best 5 schools are colored differently than worst 5, this will indicate of a major difference between sectors. on the contrast, if both the best and the worst are multicolored, it will indicate equality.
We see no differences between sectors, they appear the same within the best schools and the worst schools. Equality to all ;-)
Why:
We wanted to see whether there are trends in the Bagrut grades along the years. Also, we wanted to compare those trends across different sectors and districts.
What:
The graph demonstrates the mean grades over the years in Math or English along the different Yahal (the two most indicative subjects because they are learned similarly in all sectors with no special advantage to any sector). The trends are presented for each sector or district.
How:
With line charts we can identify trends along the years, and by coloring them differently for each sector or district we can compare them.
Here is the link for the interactive app.
In general, it seems that in most cases Jewish sectors (e.g., Jewish, Religious and Haredi) and central geographic districts, have higher mean grades. The worst grades are presented in south district and Bedouin sector, which are partly correlated.
In Math, we see an increase in the average grades, from 2013 to 2015 and from then a decrease is presented.
In English, for 4-5 Yahal there is an overall decrease, whereas at 3 Yahal the trends are moderately positive.
Why:
We thought it will be interesting to see if there is a general correlation between the abilities in language subjects by examining student’s grades in thier first language and their grades in English (a second language).
What:
We created a scatter plot to present the correlation between a student’s first language and their grades in English at the different Yahals.
On the X axis we present the first language (Hebrew for Jewish and Arabic for Arabs) and each sector (Jewish and Arab) is colored differently.
How:
A scatter plot is the best way to demonstrate a correlation between variables. This will allow us to compare the correlations of Jewish and Arab sectors between thier first language abilities and their performances in English.
Here is the link for the interactive app.
We see an overall weak correlations between these subject. Yet, the correlations in the Jewish sector are stronger, especially in 5 Yahal.
Why:
Continuing with searching for correlations, we also wanted to see if there are connections between the linguistic subjects for the Jewish sector. That is, if there will be a correlation between abilities in Hebrew and abilities in History or Literature, two subjects that are based on being good in Hebrew.
We also wanted to see if those correlations are consistent over the years.
What:
We created a scatter plot to present the correlation between the grades in Hebrew and the grades in either History or Literature. A selection between the years is also possible, in order to compare the correlations over the years.
How:
Once again, a scatter plot is the best way to demonstrate a correlation between variables. This allows us to compare the correlations of Hebrew-History and Hebrew-Literature over the years.
Here is the link for the interactive app.
The correlation between Hebrew and History is stronger than the correlation between Hebrew and Literature at all of the years, but the gap is getting smaller at 2018 and 2019.
Why:
We wanted to see whether there are, in general, subject who correlates with each other (besides those we already saw).
What:
We created a heatmap of the correlation between all the core subjects (chosen by subjects that are thought in all schools).
For subjects that are taught differently in different sectors we created one combined subject. For example, History is taught differently in different sectors, History in the heatmap represents a combination of Jewish history for Jewish schools, Arab history for Arabic schools and Druze history for Druze schools.
We present the average grade across the years.
How:
A heatmap can represent the correlations differences in the most interpetable way. The more the square is colored in strong blue, the higher the correlation. The more the square is colored in strong yellow, the lower the correlation.
*No negative correlations were found
We can see quite clearly that English and Sport are the most non-correlated subject and have the weakest connections to the other subjects, since the colors there seem lighter than at the others. Surprising exceptions can be seen at sport-Math and Sports-First language. Math and First language seem to have high correlations with all of the subjects besides English. The low correlations between First language and english is not surprising and match what we saw in another visualization even when separating to different first languages.
# setup ------------------------------------------------------------------------
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, comment = '',
message = FALSE, paged.print = FALSE,
dpi = 300, fig.width = 8, fig.height = 5,
fig.align = 'center')
if(!require(rmdformats)) install.packages("rmdformats",repos = "http://cran.us.r-project.org")## Loading required package: rmdformats
if(!require(tidyverse)) install.packages("tidyverse",repos = "http://cran.us.r-project.org")
if(!require(rstatix)) install.packages("rstatix",repos = "http://cran.us.r-project.org")
if(!require(ggplot2)) install.packages("ggplot2",repos = "http://cran.us.r-project.org")
if(!require(plotly)) install.packages("plotly",repos = "http://cran.us.r-project.org")
if(!require(ggpubr)) install.packages("ggpubr",repos = "http://cran.us.r-project.org")
if(!require(tidylog)) install.packages("tidylog",repos = "http://cran.us.r-project.org")
if(!require(shiny)) install.packages("shiny",repos = "http://cran.us.r-project.org")
if(!require(rgdal)) install.packages("rgdal",repos = "http://cran.us.r-project.org")
if(!require(sp)) install.packages("sp",repos = "http://cran.us.r-project.org")
if(!require(ggcorrplot)) install.packages("ggcorrplot",repos = "http://cran.us.r-project.org")
library(tidyverse)
library(rstatix)
library(ggplot2)
library(plotly)
library(ggpubr)
library(tidylog)
library(shiny)
library(rgdal)
library(sp)
library(ggcorrplot)
dat <- read.csv("df_csv.csv", header = TRUE, encoding = "UTF-8")
dat$subject <- as.character(trimws(dat$subject, which = c("both")))
dat$district <- as.character(trimws(dat$district, which = c("both")))
dat$sector <- as.character(trimws(dat$sector, which = c("both")))
dat$supervision <- as.character(trimws(dat$supervision, which = c("both")))
colnames(dat)[1] <- "City"
##### Number of students in 5 yahal math or english ---------------------------
dat1 <- dat
dat1$subject[dat1$subject == "מתמטיקה"] <- "Math"
dat1$subject[dat1$subject == "אנגלית"] <- "English"
x <- which(dat1$subject=="Math")
y <- which(dat1$subject=="English")
c <- c(x,y)
dat1 <- dat1[c,]
dat1$subject <- as.factor(dat1$subject)
dat1 <- dat1 %>%
filter(sector != "NA")## filter: no rows removed
server <- function(input, output, session) {
#Summarize Data and then Plot
data1 <- reactive({
req(input$subject)
req(input$type)
df <- if(input$type == "Absolute"){
dat1 %>%
filter(subject %in% input$subject) %>%
filter(yahal == 5) %>%
group_by(sector, grad_year) %>%
summarise(total = sum(students))
}
else if(input$type == "Percentage"){
dat1 %>%
filter(subject %in% input$subject) %>%
group_by(sector, grad_year) %>%
summarise(total = sum(students[yahal == 5]) / sum(students))
}
})
#Plot
output$plot <- renderPlotly({
ggplotly(
ggplot(data1(),
aes(x = grad_year, y = total, colour = sector))+
geom_line(size = 1.5) +
scale_colour_manual(values=c("יהודי" = "#0d0887",
"דתי" = "#6a00a8",
"חרדי" = "#b12a90",
"דרוזי" = "#e16462",
"בדואי" = "#fca636",
"ערבי" = "#f0f921"))+
theme_minimal()+
scale_x_continuous(breaks=c(2013, 2014, 2015, 2016, 2017, 2018, 2019))+
labs(y = "Number / Percentage of Students in 5 Yahal",
x = "Year of graduation")+
theme(legend.title=element_blank(),
plot.margin=unit(c(1,3,1,1),"cm")),
tooltip = c("x", "y")) %>%
layout(yaxis = list(fixedrange = TRUE),
xaxis = list(fixedrange = TRUE),
margin = list(r = 10, l = 20)) %>%
config(displayModeBar = FALSE)
})
}
ui <- basicPage(
h1("Number of students in 5 Yahal - math/english"),
selectInput(inputId = "subject",
label = "Choose subject",
list("Math", "English")),
selectInput(inputId = "type",
label = "Choose Information Type",
list("Absolute","Percentage")),
mainPanel(plotlyOutput("plot")),
h6("* For anonymization purposes, questionnaires with less than 11 students at the school do not appear in the data. Both the sum of students and percentage of students are callculated for questionnaires that 11 or more students took in a specific school"))
shinyApp(ui, server)## PhantomJS not found. You can install it with webshot::install_phantomjs(). If it is installed, please make sure the phantomjs executable can be found via the PATH variable.
# Map --------------------------------------------------------------------------
dat2 <- dat %>%
filter(sector != "NA",
grad_year == 2019,
subject == "מתמטיקה") %>%
group_by(school_id, school_name, school_x, school_y) %>%
summarise(school_mean = mean(grades)) %>%
ungroup(school_id, school_name, school_x, school_y) %>%
mutate(mean_all = mean(school_mean)) %>%
mutate(dev = scale(school_mean))## filter: removed 78,564 rows (98%), 1,603 rows remaining
## group_by: 4 grouping variables (school_id, school_name, school_x, school_y)
## summarise: now 763 rows and 5 columns, 3 group variables remaining (school_id, school_name, school_x)
## ungroup: no grouping variables
## mutate: new variable 'mean_all' (double) with one unique value and 0% NA
## mutate: new variable 'dev' (double) with 720 unique values and 0% NA
dat2$rank <- ifelse(dat2$dev > 1.5, "Higest grades", NA)
dat2$rank <- ifelse(dat2$dev > 0.5 & dat2$dev <= 1.5, "Grades above avg", dat2$rank)
dat2$rank <- ifelse(dat2$dev < -0.5 & dat2$dev >= -1.5,"Grades below avg", dat2$rank)
dat2$rank <- ifelse(dat2$dev < -1.5, "Lowest grades", dat2$rank)
dat2 <- na.omit(dat2)
dat2$rank <- factor(dat2$rank,levels = c("Higest grades",
"Grades above avg",
"Grades below avg",
"Lowest grades" ) )
##### Map
israel <- readOGR(
dsn= getwd() ,
)## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\אביב\Google Drive\Study\ויזואליזציה של מידע\Data_Vizualization", layer: "gadm36_ISR_1_area"
## with 7 features
## It has 1 fields
## Regions defined for each Polygons
## Regions defined for each Polygons
## [1] "factor"
SP <- rbind(is, wb)
SP$id[SP$id == 0] <- 4
map <- ggplot(SP, aes(long, lat, group = group)) +
geom_polygon(alpha = 0.5) +
coord_equal() +
geom_point(data = dat2, aes(x = school_x, y = school_y,color = rank,
text = paste0(school_name,
"\nMean Grade: ", round(school_mean,2))),
inherit.aes = F, alpha = 0.7, size = 0.9) +
scale_colour_manual(values=c("Higest grades" = "green",
"Grades above avg" = "green4",
"Grades below avg" = "firebrick4",
"Lowest grades" = "firebrick1"))+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(),axis.line = element_blank(),
axis.ticks.x = element_blank(), axis.text.x = element_blank(),
axis.ticks.y = element_blank(), axis.text.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
plot.title = element_text(hjust = 0.5))+
labs(title = "Map of schools in Israel by Math grades in 2019")+
annotate("text", x = 36.15000, y = 30.90000, label = paste0("Country Mean", round(dat2$mean_all,2)))## Warning: Ignoring unknown aesthetics: text
# Top-Bottom Citizenship grades ------------------------------------------------
dat3 <- dat %>%
filter(grad_year == 2019,
subject == "אזרחות",
yahal == 2) %>%
group_by(school_id) %>%
mutate(mean_school = mean(grades)) %>%
ungroup() %>%
mutate(mean_all = mean(grades),
dev = mean_school-mean_all) %>%
arrange(desc(mean_school)) %>%
filter(row_number() == 1:5 | row_number() > n()-5) %>%
mutate(type = case_when(dev > 0 ~ "Best",
TRUE ~ "Worst"))## filter: removed 79,354 rows (99%), 813 rows remaining
## group_by: one grouping variable (school_id)
## mutate (grouped): new variable 'mean_school' (double) with 701 unique values and 0% NA
## ungroup: no grouping variables
## mutate: new variable 'mean_all' (double) with one unique value and 0% NA
## new variable 'dev' (double) with 701 unique values and 0% NA
## Warning in row_number() == 1:5: longer object length is not a multiple of
## shorter object length
## filter: removed 803 rows (99%), 10 rows remaining
## mutate: new variable 'type' (character) with 2 unique values and 0% NA
ggplotly(
ggplot(dat3,
aes(x = reorder(school_name, dev), fill = sector, weight = dev, text = paste0("School Average: ", mean_school))) +
geom_bar() +
scale_fill_manual(values=c("יהודי" = "#0d0887",
"דתי" = "#6a00a8",
"חרדי" = "#b12a90",
"ערבי" = "#f0f921")) +
labs(x = "School Name", y = "Deviation from Country Average") +
coord_flip() +
theme_minimal() +
geom_hline(yintercept = 0, size = 2, color = "red") +
annotate("text", x = 9.5, y = -10,
label = paste0("Country Average:", round(dat3$mean_all[1],2)), color = "red"),
tooltip = "text") %>%
layout(yaxis = list(fixedrange = TRUE),
xaxis = list(fixedrange = TRUE)) %>%
config(displayModeBar = FALSE)# Sector/District Per-unit -----------------------------------------------------
dat4 <- dat
dat4$subject[dat4$subject == "מתמטיקה"] <- "Math"
dat4$subject[dat4$subject == "אנגלית"] <- "English"
x <- which(dat4$subject=="Math")
y <- which(dat4$subject=="English")
c <- c(x,y)
dat4 <- dat4[c,]
dat4$subject <- as.factor(dat4$subject)
dat4 <- dat4 %>%
filter(sector != "NA")## filter: no rows removed
server <- function(input, output, session) {
data4 <- reactive({
req(input$subject)
req(input$comparison)
req(input$yahal)
df <- if(input$comparison == "sector"){
dat4 %>%
filter(subject %in% input$subject, yahal %in% input$yahal) %>%
group_by(sector, grad_year) %>%
summarise(students = mean(grades)) %>%
rename(comp = 1)
}
else if(input$comparison == "district"){
dat4 %>%
filter(subject %in% input$subject, yahal %in% input$yahal) %>%
group_by(district, grad_year) %>%
summarise(students = mean(grades)) %>%
rename(comp = 1)
}
})
#Plot
output$plot <- renderPlotly({
ggplotly(
ggplot(data4(),
aes(x = grad_year, y = students, group = comp, color = comp,
text = paste0(comp,
"\nGraduation Year: ", grad_year,
"\nMean Grade: ", round(students,2))))+
scale_x_continuous(breaks=c(2013, 2014, 2015, 2016, 2017, 2018, 2019))+
geom_line(size = 1.5) +
scale_colour_manual(values=c("יהודי" = "#0d0887",
"דתי" = "#6a00a8",
"חרדי" = "#b12a90",
"דרוזי" = "#e16462",
"בדואי" = "#fca636",
"ערבי" = "#f0f921",
"צפון" = "#1a9850",
"חיפה" = "#91cf60",
"תל אביב" = "#d9ef8b",
"מרכז" = "#fee08b",
"ירושלים" = "#fc8d59",
"דרום" = "#d73027"))+
theme_minimal()+
theme(legend.title=element_blank())+
labs(y = "Mean Grade",
x = "Year of graduation"),
tooltip = c("text")) %>%
layout(yaxis = list(fixedrange = TRUE),
xaxis = list(fixedrange = TRUE)) %>%
config(displayModeBar = FALSE)
})
}
ui <- basicPage(
h1("Mean Grades pre Sector - math/english"),
selectInput(inputId = "subject",
label = "Choose subject",
list("Math", "English")),
selectInput(inputId = "comparison",
label = "Choose comparison variable",
list("district", "sector")),
selectInput(inputId = "yahal",
label = "Choose number of yahal",
list("3", "4", "5")),
mainPanel(plotlyOutput("plot")))
shinyApp(ui, server)# First Lenguage - English -----------------------------------------------------
dat5 <- dat %>%
select(c('school_id', 'yahal', 'grad_year', 'subject', 'grades', 'sector'))## select: dropped 9 variables (City, students, school_name, district, supervision, …)
dat5$subject[dat5$subject == "אנגלית"] <- "English"
dat5$subject[dat5$subject == "הבעה עברית"] <- "Hebrew"
dat5$subject[dat5$subject == "ערבית לערבים"] <- "Arabic"
dat5$subject[dat5$subject == "ערבית לדרוזים"] <- "Arabic"
english_dat <- dat5[which(dat5$subject=="English"),]
colnames(english_dat)[4] <- "english"
colnames(english_dat)[5] <- "english_grades"
hebrew_dat <- dat5[which(dat5$subject=="Hebrew"),]
colnames(hebrew_dat)[4] <- "first_language"
colnames(hebrew_dat)[5] <- "first_language_grade"
arabic_dat <- dat5[which(dat5$subject=="Arabic"),]
colnames(arabic_dat)[4] <- "first_language"
colnames(arabic_dat)[5] <- "first_language_grade"
dat5 <- rbind(hebrew_dat, arabic_dat)
dat5 <- merge(x = dat5, y = english_dat,
by = c("school_id", "sector", "grad_year"),
all.x = FALSE, all.y = TRUE)
dat5 <- na.omit(dat5)
server <- function(input, output, session) {
data5 <- reactive({
req(input$yahal)
df <- dat5 %>%
filter(yahal.y %in% input$yahal)
})
#Plot
output$plot <- renderPlotly({
ggplotly(
ggplot(data5(),
aes(x = first_language_grade, y = english_grades, color = first_language))+
geom_point(alpha = 0.5)+
stat_cor(aes(label = paste("r = ", ..r.., sep = " ")), label.x = 50, label.y = c(98,95))+
geom_smooth(method = lm)+
scale_color_manual(values=c("#fcd225", "#0d0887"))+
theme_minimal()+
labs(x = "First Language Mean Grades",
y = "English Mean Grades")+
theme(plot.margin=unit(c(1,1,1,1),"cm")),
tooltip = c("x", "y")) %>%
config(displayModeBar = FALSE)
})
}
ui <- basicPage(
h1("Correlation between abilities in a first language and in a second language"),
selectInput(inputId = "yahal",
label = "Choose how many English yahal",
list("3", "4", "5")),
mainPanel(plotlyOutput("plot")))
shinyApp(ui, server)# Hebrew - History/Literature correlation --------------------------------------
dat6 <- dat
dat6$subject[dat6$subject == "הבעה עברית"] <- "Hebrew"
dat6$subject[dat6$subject == "הסטוריה"] <- "History"
dat6$subject[dat6$subject == "ספרות"] <- "Literature"
dat6 <- dat6[dat6$subject == "Hebrew"|
dat6$subject == "History"|
dat6$subject == "Literature",]
dat6 <- dat6 %>%
filter(yahal == 2) %>%
select(c('school_id', 'grad_year', 'subject', 'grades'))## filter: removed 834 rows (7%), 10,805 rows remaining
## select: dropped 11 variables (City, students, yahal, school_name, district, …)
dat6 <- reshape(dat6, idvar = c("school_id", "grad_year"),
timevar = "subject", direction = "wide")
dat6 <- na.omit(dat6)
colnames(dat6)[3] <- "Hebrew"
colnames(dat6)[4] <- "History"
colnames(dat6)[5] <- "Literature"
server <- function(input, output, session) {
data6 <- reactive({
req(input$subject)
req(input$Year)
df <- if(input$subject == "History"){
dat6 %>%
filter(grad_year %in% input$Year) %>%
select(c('school_id', 'Hebrew', 'History')) %>%
rename(Grades = 3)
}
else if(input$subject == "Literature"){
dat6 %>%
filter(grad_year %in% input$Year) %>%
select(c('school_id', 'Hebrew', 'Literature')) %>%
rename(Grades = 3)
}
})
#Plot
output$plot <- renderPlotly({
ggplotly(
ggplot(data6(),
aes(x = Hebrew, y = Grades))+
geom_point(alpha = 0.7)+
stat_cor(aes(label = paste("r = ", ..r.., sep = " ")), label.x = 50, label.y = 82)+
geom_smooth(method = lm)+
scale_colour_viridis_d(option = "plasma", direction = 1)+
theme_minimal()+
labs(x = "First Language Mean Grades",
y = "Language-Rich Subjects (History / Literature)")+
theme(plot.margin=unit(c(1,1,2,1),"cm")),
tooltip = c("x", "y")) %>%
config(displayModeBar = FALSE)
})
}
ui <- basicPage(
h1("Correlation between Hebrew for jewish and History/Literature"),
selectInput(inputId = "Year",
label = "Choose a graduation year",
list("2013", "2014", "2015", "2016", "2017", "2018", "2019")),
selectInput(inputId = "subject",
label = "Choose subject",
list("History", "Literature")),
mainPanel(plotlyOutput("plot")))
shinyApp(ui, server)# Subject correlations ---------------------------------------------------------
dat7 <- dat
dat7$subject[dat7$subject == "הבעה עברית"] <- "First language"
dat7$subject[dat7$subject == "ערבית לערבים"] <- "First language"
dat7$subject[dat7$subject == "ערבית לדרוזים"] <- "First language"
dat7$subject[dat7$subject == "הסטוריה"] <- "History"
dat7$subject[dat7$subject == "הסטוריה לבי'ס דרוזי"] <- "History"
dat7$subject[dat7$subject == "הסטוריה לבי'ס ערבי"] <- "History"
dat7$subject[dat7$subject == "ספרות"] <- "Literature"
dat7$subject[dat7$subject == "אנגלית"] <- "English"
dat7$subject[dat7$subject == "מתמטיקה"] <- "Math"
dat7$subject[dat7$subject == "אזרחות"] <- "Citizenship"
dat7$subject[dat7$subject == "חנוך גופני"] <- "Sport"
sub <- c("Math","English","Literature","History","First language","Citizenship", "Sport")
dat7 <- dat7 %>%
filter(subject %in% sub) %>%
group_by(school_id, subject) %>%
get_summary_stats(grades, type = "mean") %>%
select(c('school_id', 'subject', 'mean'))## filter: removed 39,350 rows (49%), 40,817 rows remaining
## group_by: 2 grouping variables (school_id, subject)
## select: dropped 2 variables (variable, n)
dat7$subject <- as.factor(dat7$subject)
heat_dat <- pivot_wider(dat7, names_from = "subject",
values_from = "mean")## pivot_wider: reorganized (subject, mean) into (Citizenship, English, First language, History, Literature, …) [was 4878x3, now 884x8]
## select: dropped one variable (school_id)
cormat <- cor(heat_dat)
ggplotly(
ggcorrplot(cormat) +
scale_fill_gradient(limit = c(0,1), low = "yellow", high = "blue"),
tooltip = c("value")) %>%
config(displayModeBar = FALSE)## Scale for 'fill' is already present. Adding another scale for 'fill', which
## will replace the existing scale.